Conversation

@brouberol
Contributor

Description

We allow the opensearch-operator to watch multiple namespaces.

We keep the original -watch-namespace flag to ensure backwards compatibility. We simply split the value on commas and populate the cache for each namespace in the comma-separated list.

Note: Because the watchNamespace variable was being tested for emptiness before flag.Parse() was being called, it was always empty, causing the operator to always watch all namespaces in the cluster. This is no longer the case.
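
For illustration, the resulting flow looks roughly like the sketch below. This is a simplified sketch rather than a verbatim excerpt of main.go, and it assumes a controller-runtime version that exposes cache.Options.DefaultNamespaces:

package main

import (
    "flag"
    "strings"

    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/cache"
)

func main() {
    var watchNamespace string
    flag.StringVar(&watchNamespace, "watch-namespace", "",
        "The comma-separated list of namespaces that the controller manager is restricted to watch. "+
            "If not set, default is to watch all namespaces.")
    // flag.Parse() must run before watchNamespace is inspected,
    // otherwise the variable is still empty at that point.
    flag.Parse()

    cacheOpts := cache.Options{}
    if watchNamespace != "" {
        // Restrict the cache to each namespace listed in the flag value.
        cacheOpts.DefaultNamespaces = map[string]cache.Config{}
        for _, ns := range strings.Split(watchNamespace, ",") {
            cacheOpts.DefaultNamespaces[strings.TrimSpace(ns)] = cache.Config{}
        }
    }

    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{Cache: cacheOpts})
    if err != nil {
        panic(err)
    }
    _ = mgr // controllers would be registered here before calling mgr.Start(...)
}

With an empty -watch-namespace, DefaultNamespaces stays unset and the manager keeps watching all namespaces, preserving the previous (intended) default.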

I have added documentation in the user guide as well as in the chart values.

Because this change occurs in main.go, for which we don't have unit tests, I'll enclose my manual test notes.

Testing

We first rebuild the operator binary.

~/code/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ make build
test -s /Users/brouberol/code/opensearch-k8s-operator/opensearch-operator/bin/controller-gen || GOBIN=/Users/brouberol/code/opensearch-k8s-operator/opensearch-operator/bin go install sigs.k8s.io/controller-tools/cmd/[email protected]
/Users/brouberol/code/opensearch-k8s-operator/opensearch-operator/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
go build -o bin/manager main.go

We ensure that the new behavior is now available.

~/code/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ ./bin/manager --help 2>&1 | grep -A 1 watch-namespace
  -watch-namespace string
    	The comma-separated list of namespaces that the controller manager is restricted to watch. If not set, default is to watch all namespaces.

We run the operator alongside a local minikube.

~/code/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ ./bin/manager -watch-namespace ns1,ns2 
{"level":"info","ts":"2025-09-17T16:34:39.456+0200","logger":"setup","msg":"Starting manager"}
{"level":"info","ts":"2025-09-17T16:34:39.457+0200","msg":"starting server","name":"health probe","addr":"[::]:8081"}
{"level":"info","ts":"2025-09-17T16:34:39.457+0200","logger":"controller-runtime.metrics","msg":"Starting metrics server"}
...

We define a namespace-less cluster resource:

~/c/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ cat cluster.yaml
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: opensearch-cluster
spec:
  general:
    serviceName: opensearch-cluster
    version: '3'
  dashboards:
    enable: true
    version: '3'
    replicas: 1
    resources:
      requests:
        memory: "512Mi"
        cpu: "200m"
      limits:
        memory: "512Mi"
        cpu: "200m"
  nodePools:
    - component: nodes
      replicas: 3
      diskSize: "5Gi"
      nodeSelector:
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
      roles:
        - "cluster_manager"
        - "data"

We create 3 namespaces:

~/c/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ kubectl create namespace ns1
namespace/ns1 created
~/c/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ kubectl create namespace ns2
namespace/ns2 created
~/c/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ kubectl create namespace ns3
namespace/ns3 created

We now create an opensearch cluster in ns1:

~/c/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ kubectl create -n ns1 -f cluster.yaml
opensearchcluster.opensearch.opster.io/opensearch-cluster created

We start seeing activity in the operator logs:

{"level":"info","ts":"2025-09-17T16:38:19.560+0200","msg":"Reconciling OpenSearchCluster","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch-cluster","namespace":"ns1"},"namespace":"ns1","name":"opensearch-cluster","reconcileID":"4e9e94b2-8f25-4832-92bf-3a8e23349e3b","cluster":{"name":"opensearch-cluster","namespace":"ns1"}}
{"level":"info","ts":"2025-09-17T16:38:19.566+0200","msg":"Start reconcile - Phase: PENDING","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch-cluster","namespace":"ns1"},"namespace":"ns1","name":"opensearch-cluster","reconcileID":"4e9e94b2-8f25-4832-92bf-3a8e23349e3b","cluster":{"name":"opensearch-cluster","namespace":"ns1"}}
...

We now create an opensearch cluster in ns2:

~/c/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ kubectl create -n ns2 -f cluster.yaml
opensearchcluster.opensearch.opster.io/opensearch-cluster created

We start seeing activity in the operator logs, this time related to the cluster in ns2:

{"level":"info","ts":"2025-09-17T16:41:40.313+0200","msg":"Reconciling OpenSearchCluster","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch-cluster","namespace":"ns2"},"namespace":"ns2","name":"opensearch-cluster","reconcileID":"2b0e1d2a-9d9b-4cff-b52c-d3b32789b556","cluster":{"name":"opensearch-cluster","namespace":"ns2"}}
{"level":"info","ts":"2025-09-17T16:41:40.324+0200","msg":"Start reconcile - Phase: PENDING","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch-cluster","namespace":"ns2"},"namespace":"ns2","name":"opensearch-cluster","reconcileID":"2b0e1d2a-9d9b-4cff-b52c-d3b32789b556","cluster":{"name":"opensearch-cluster","namespace":"ns2"}}
...

We finally create a cluster in ns3:

~/c/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ kubectl create -n ns3 -f cluster.yaml
opensearchcluster.opensearch.opster.io/opensearch-cluster created

This time, no log related to the cluster in ns3 is observed in the controller logs.

Chart changes

I render the chart using the default values. The output does not contain the -watch-namespace flag.

~/c/opensearch-k8s-operator/c/opensearch-operator watch-multiple-ns ?1 ❯ helm template  . | grep watch-namespace
~/c/opensearch-k8s-operator/c/opensearch-operator watch-multiple-ns ?1 ❯

I then inject either a single namespace or multiple namespaces to watch, either as a comma-separated string or as a list, to ensure that the rendering is correct:

~/c/opensearch-k8s-operator/c/opensearch-operator watch-multiple-ns ?1 ❯ helm template  . --set-json='manager.watchNamespace="ns1"' | grep watch-namespace
        - --watch-namespace=ns1
~/c/opensearch-k8s-operator/c/opensearch-operator watch-multiple-ns ?1 ❯ helm template  . --set-json='manager.watchNamespace="ns1,ns2"' | grep watch-namespace
        - --watch-namespace=ns1,ns2
~/c/opensearch-k8s-operator/c/opensearch-operator watch-multiple-ns ?1 ❯ helm template  . --set-json='manager.watchNamespace=["ns1"]' | grep watch-namespace
        - --watch-namespace=ns1
~/c/opensearch-k8s-operator/c/opensearch-operator watch-multiple-ns ?1 ❯ helm template  . --set-json='manager.watchNamespace=["ns1", "ns2"]' | grep watch-namespace
        - --watch-namespace=ns1,ns2

Issues Resolved

Closes #374

Check List

  • Commits are signed per the DCO using --signoff
  • Unittest added for the new/changed functionality and all unit tests are successful
  • Customer-visible features documented
  • No linter warnings (make lint)

If CRDs are changed:

  • CRD YAMLs updated (make manifests) and also copied into the helm chart
  • Changes to CRDs documented

Please refer to the PR guidelines before submitting this pull request.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@inflatador

@prudhvigodithi @swoehrl-mw Greetings! I'm an SRE with the Wikimedia Foundation and I work with @brouberol.

We're rolling out a new OpenSearch environment on K8s in the next month or so and I was wondering if y'all had the cycles to review this change? There's more context in our task tracker if y'all are interested.

Thanks for taking a look and feel free to ping here or in OpenSearch Slack if you have any questions or comments.

@prudhvigodithi
Member

Adding @patelsmit32123 @synhershko to please take a look and add your thoughts.

@synhershko
Collaborator

We're rolling out a new OpenSearch environment on K8s in the next month or so and I was wondering if y'all had the cycles to review this change? There's more context in our task tracker if y'all are interested.

FWIW we are planning a massive release of a 3.0 version of this operator which will be significantly better and safer to use in production.

@synhershko
Collaborator

What is the use-case for doing what you are doing here? I might be missing something

cc @josedev-union

@brouberol
Contributor Author

brouberol commented Oct 23, 2025

What is the use-case for doing what you are doing here?

The use case is mostly deployment convenience (only having to deploy a single operator cluster-wide), as well as aligning with common operator behavior within the Kubernetes ecosystem.

For example, having a single operator being able to watch multiple namespaces is supported by:

Note: We're not actively running all of these operators (only a subset), but I sampled actively maintained operator codebases and documentation to showcase that this is a common behavior.

My point (and IMHO the general sentiment over at #374) is that this behavior is expected by operator users and deployers, as it's become quite standard.

It is something that is natively supported by the operator SDK, and it does not take anything away from the current operator behavior, as it only adds the ability for the operator to manage clusters across one to many namespaces, instead of a single one at the moment.

I hope this clears things up.

Note: I just realized that for this patch to be complete, it still needs to iterate over the watched namespaces to set up the appropriate roles and role bindings in each of them. I'm happy to send that work over if the intent of the feature request is approved.

@synhershko
Collaborator

Got it. Happy to merge this once conflicts are resolved and CI is green.

Contributor

@josedev-union josedev-union left a comment


  • First, please resolve conflicts.
  • Second,
    ## If this is set to true, RoleBindings will be used instead of ClusterRoleBindings, inorder to restrict ClusterRoles
    ## to the namespace where the operator and OpenSearch cluster are in. In that case, specify the namespace where they
    ## are in in manager.watchNamespace field.
    ## If false, ClusterRoleBindings will be used
    useRoleBindings: false

    This is not directly linked to your changes but good to have in the same scope.
    useRoleBindings should be enabled only when watchNamespace is equal to the release namespace. If not, then we need to use ClusterRoleBindings.
    Please update the helm doc for the useRoleBindings part accordingly and fix the typo in the current one.

Contributor

@josedev-union josedev-union left a comment


There's a critical issue when watchNamespace is not set (empty string).

@brouberol
Contributor Author

brouberol commented Oct 29, 2025

@josedev-union I'm confused by the requested useRoleBindings change.

The chart defines 2 ClusterRoles:

  • {{ include "opensearch-operator.fullname" . }}-{{ .Release.Namespace }}-proxy-role
  • {{ include "opensearch-operator.fullname" . }}-{{ .Release.Namespace }}-manager-role

If we set manager.watchNamespace to, say, [ns1, ns2], we probably want to use RoleBindings to bind each of these ClusterRoles within each of these 2 namespaces, don't we? The alternative would be to use a ClusterRoleBinding, which would grant the opensearch operator access to, amongst other things, Secret resources in all namespaces.

Cf https://kubernetes.io/docs/reference/access-authn-authz/rbac/#rolebinding-and-clusterrolebinding

A RoleBinding may reference any Role in the same namespace. Alternatively, a RoleBinding can reference a ClusterRole and bind that ClusterRole to the namespace of the RoleBinding. If you want to bind a ClusterRole to all the namespaces in your cluster, you use a ClusterRoleBinding.

(emphasis mine)

The way I see it, the logic should be to use RoleBindings as soon as manager.watchNamespace is non-empty. If it is empty, then the controller would watch all namespaces, and would thus require a ClusterRoleBinding.
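
To make this concrete, here is a rough sketch (purely illustrative; the ClusterRole and ServiceAccount names below are assumptions, not taken from the chart) of the RoleBinding objects that would be needed, one per watched namespace, each referencing the shared operator ClusterRole:

package main

import (
    "fmt"
    "strings"

    rbacv1 "k8s.io/api/rbac/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "sigs.k8s.io/yaml"
)

// roleBindingsFor builds one RoleBinding per watched namespace, each granting
// the operator ServiceAccount the permissions of the shared ClusterRole, but
// scoped to that namespace only.
func roleBindingsFor(watchNamespaces, clusterRole, saName, saNamespace string) []rbacv1.RoleBinding {
    var bindings []rbacv1.RoleBinding
    for _, ns := range strings.Split(watchNamespaces, ",") {
        ns = strings.TrimSpace(ns)
        bindings = append(bindings, rbacv1.RoleBinding{
            TypeMeta:   metav1.TypeMeta{APIVersion: "rbac.authorization.k8s.io/v1", Kind: "RoleBinding"},
            ObjectMeta: metav1.ObjectMeta{Name: clusterRole, Namespace: ns},
            RoleRef: rbacv1.RoleRef{
                APIGroup: "rbac.authorization.k8s.io",
                Kind:     "ClusterRole",
                Name:     clusterRole,
            },
            Subjects: []rbacv1.Subject{{
                Kind:      "ServiceAccount",
                Name:      saName,
                Namespace: saNamespace,
            }},
        })
    }
    return bindings
}

func main() {
    // Hypothetical names, for illustration only.
    for _, rb := range roleBindingsFor("ns1,ns2", "opensearch-operator-manager-role", "opensearch-operator", "opensearch-operator") {
        out, _ := yaml.Marshal(rb)
        fmt.Printf("---\n%s", out)
    }
}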

WDYT?

@josedev-union
Contributor

  • This is not directly linked to your changes but good to have in the same scope.
    useRoleBindings should be enabled only when watchNamespace is equal to the release namespace. If not, then we need to use clusterRoleBindings.
    Please update the helm doc of useRoleBindings part properly and fix typo in the current one.

Yes, we need to use a ClusterRoleBinding in such cases.
What I requested is to update the helm docs in the values file to explain that clearly.
The current helm docs are incorrect: we can use a RoleBinding only when watchNamespace is a single namespace and it is equal to the release namespace, but right now they state that a RoleBinding can be used whenever watchNamespace is specified.

@brouberol
Contributor Author

brouberol commented Oct 29, 2025

So, this is where I'm not sure I agree (or maybe we agree and I'm just misunderstanding).

If you watch specific namespaces, by having a non-empty manager.watchNamespace, then the chart should use a RoleBinding to bind the operator ClusterRole in each of the watched namespaces.

The RoleBinding / ClusterRoleBinding decision should be:

flowchart TD
    B{Is manager.watchNamespace empty?}
    B -- Yes --> C[Use a ClusterRoleBinding to give permissions to the operator on *all* namespaces]
    B -- No ----> D[Use a RoleBinding referencing the operator ClusterRole in each of the watched namespaces]

Happy to hear your thoughts.

@josedev-union
Contributor

@brouberol nah, it is much more than my original thought. :)
#1101 (review)
I just recommend updating the comments in the helm chart values file, like this:

## If this is set to true, RoleBindings will be used instead of ClusterRoleBindings, in order to restrict ClusterRoles
## to the namespace where the operator and OpenSearch cluster are in.
## You need to set the release namespace as manager.watchNamespace
useRoleBindings: false

@brouberol
Contributor Author

brouberol commented Oct 29, 2025

Ok, I think I pinpointed where our misunderstanding is coming from. Let me know if I got this right.

The opensearch operator is usually deployed in the same namespace as the opensearch cluster. When that is the case, we can use a RoleBinding, because the Role will be part of the same namespace as the cluster.
When that is not the case, the current chart design is to use a ClusterRole for the operator and a ClusterRoleBinding to grant the operator permissions in all namespaces.

In that light, I understand your comment: having useRoleBindings: true would only work if {{ .Release.Namespace }} == {{ $watchedNamespace }}. If this is how the chart wants things to work, then sure, I'll update the comment.

What I was pushing towards, though, is considering that the operator can be run from its own namespace (as it is common in the ecosystem to run operators either from kube-system or from dedicated namespaces, such as opensearch-operator in our case). The operator would define its permissions via a ClusterRole, and those permissions would be bound to the watched namespaces only, via a RoleBinding in each of these namespaces ([ns1, ns2] in the following diagram).

classDiagram

    ClusterRole <|-- RoleBindingNS1
    ClusterRole <|-- RoleBindingNS2
    
    class ClusterRole {
        name: opensearch-operator-manager-role
        permissions: [...]
    }
    class RoleBindingNS1 {
        namespace: ns1
        ---
        roleRef.apiGroup: rbac.authorization.k8s.io
        roleRef.kind: ClusterRole
        roleRef.name: opensearch-operator-manager-role
        subjects[0].kind: ServiceAccount
        subjects[0].name: opensearch-operator
        subjects[0].namespace: opensearch-operator
    }
    class RoleBindingNS2 {
        namespace: ns2
        ---
        roleRef.apiGroup: rbac.authorization.k8s.io
        roleRef.kind: ClusterRole
        roleRef.name: opensearch-operator-manager-role
        subjects[0].kind: ServiceAccount
        subjects[0].name: opensearch-operator
        subjects[0].namespace: opensearch-operator
    }

The main reason to do this would be to avoid granting the opensearch-operator the ability to read/write/delete/update Secret resources in all namespaces, even those it is not watching, which is as large a security risk as you can get.

I'm happy to get told "let's punt this to another PR" and I'll just update the comment. However, let's be aware that in its current state, the operator permissions are wide open.

Linking back to the original message of #374, we see:

For security reasons we are not able to use the clusterrolebinding and have to use namespaced rolebindings instead. This means we can only use the watchNamespace mode of deployment which obviously means that we would have to install multiple operators for multiple clusters in separate namespaces.

This is the security issue they're mentioning.

@josedev-union
Contributor

I'm happy to get told "let's punt this to another PR" and I'll just update the comment. However, let's be aware that in its current state, the operator permissions are wide open.

I prefer to open a new issue for this.
Normally, operators have a boolean flag like watchGlobal and simply switch between a ClusterRoleBinding and a RoleBinding. In that case, the current implementation is ok. But now that we specify a list of namespaces, we need to revisit this topic to follow the principle of least privilege (PoLP).

@synhershko synhershko merged commit 408c100 into opensearch-project:main Nov 3, 2025
11 checks passed